Intro

Goal: To develop a model for predicting life expectancy in Baltimore down to single-block resolution, with estimates of uncertainty. You may need to develop an approach for “downsampling”, since the outcome data you’ll be able to find is likely aggregated at the neighborhood level.

Data

We have data from the Baltimore city website, the Baltimore Neighborhood Indicators Alliance (BNIA-JF), and the Maryland Department of Planning. The data consist of life expectancy estimates for each neighbourhood, along with crime, economic development and education information, all over a five-year period (2010-2014). I also have street-level, and thus block-level, data. In addition, I have information which links streets to blocks and blocks to neighbourhoods.
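The street-to-block-to-neighbourhood linkage described above can be sketched with a plain join. This is only an illustration in base R: the data frames, column names (`street`, `block`, `neighbourhood`, `life_expectancy`) and values are made up for the example and are not taken from the actual files.

```r
# Hypothetical street-level records: each street belongs to a block,
# and each block belongs to a neighbourhood (all values are made up).
streets <- data.frame(
  street = c("Street 1", "Street 2", "Street 3", "Street 4"),
  block = c("B1", "B1", "B2", "B3"),
  neighbourhood = c("Neighbourhood A", "Neighbourhood A",
                    "Neighbourhood B", "Neighbourhood C"),
  stringsAsFactors = FALSE
)

# Hypothetical neighbourhood-level outcome (no estimate for Neighbourhood C).
neighbourhoods <- data.frame(
  neighbourhood = c("Neighbourhood A", "Neighbourhood B"),
  life_expectancy = c(72.1, 78.4),
  stringsAsFactors = FALSE
)

# Left join: every street keeps its row and inherits the neighbourhood
# estimate; streets in neighbourhoods without an estimate get NA.
block_level <- merge(streets, neighbourhoods,
                     by = "neighbourhood", all.x = TRUE)
print(block_level)
```

A join like this only copies the neighbourhood value down to every block inside it; it is a starting point for the block-level model, not a block-level estimate in itself.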

Descriptives

The goal of this analysis is to predict life expectancy at the street-block level, so I first needed to check whether the blocks in my dataset can be treated as street blocks. Since some of the data files have information on neighbourhood blocks, I plotted the neighbourhood boundaries as delineated by the block-level data obtained from the Baltimore city website and then overlaid the neighbourhood data obtained from the Maryland Department of Planning. Furthermore, using information from the Baltimore gisdata website, I was able to find out how a “block” is actually defined. All of this points to the possibility of using blocks from our dataset as street blocks.

For more plots examining the fits, please visit my GitHub repo.

All of this indicates a good fit. I also used GIS data from the Baltimore city website and found that each block was defined as a street block.

An example of a city block pulled from the dataset

Analysis

Since we have spatial data, I ran both the Mantel test (Mantel 1966) and Moran’s I (Moran 1950) to examine whether spatial autocorrelation exists in this dataset. Please note that while both tests measure spatial autocorrelation, they refer to quite different concepts.

Moran’s I (Moran 1950) is useful when one wants to know the correlation of a variable with itself through space, i.e., the extent to which the occurrence of an event in one areal unit makes the occurrence of an event in a neighboring areal unit more or less likely.
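To make the definition concrete, here is a minimal base R sketch of Moran’s I with a one-sided permutation test. It is not necessarily the routine used for this analysis; the function names `morans_i` and `morans_i_test`, the binary neighbour matrix `w` and the life expectancy values are all assumptions made for illustration.

```r
# Moran's I for a numeric vector x and a spatial weights matrix w,
# where w[i, j] > 0 if areal units i and j are neighbours (zero diagonal).
morans_i <- function(x, w) {
  stopifnot(is.numeric(x), is.matrix(w),
            nrow(w) == length(x), ncol(w) == length(x))
  z <- x - mean(x)                       # deviations from the mean
  (length(x) / sum(w)) * sum(w * outer(z, z)) / sum(z^2)
}

# One-sided permutation test for positive autocorrelation: shuffle the
# values over the areal units and recompute the statistic each time.
morans_i_test <- function(x, w, n_perm = 999) {
  observed <- morans_i(x, w)
  permuted <- replicate(n_perm, morans_i(sample(x), w))
  p_value <- (sum(permuted >= observed) + 1) / (n_perm + 1)
  list(statistic = observed, p_value = p_value)
}

# Toy example: four areal units in a row, adjacent units are neighbours.
w <- matrix(0, 4, 4)
w[cbind(1:3, 2:4)] <- 1
w[cbind(2:4, 1:3)] <- 1
life_exp <- c(70, 71, 78, 79)            # made-up values

set.seed(1)
res <- morans_i_test(life_exp, w)
print(res)
```

Values near +1 indicate that neighbouring units tend to have similar values, values near -1 that they tend to differ, and values near the expectation of -1/(n - 1) suggest no spatial autocorrelation.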

Mantel’s test (Mantel 1966; Dutilleul et al. 2000) measures the correlation between two distance matrices computed on the same set of samples; that is, it judges whether closeness in one set of variables is related to closeness in another set of variables. Relating this to our dataset, we can use it to see whether samples that are close in terms of geographic location are also close in terms of life expectancy.
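The idea behind the Mantel test can be sketched in base R as a correlation between two distance matrices, with significance assessed by permuting the rows and columns of one of them. This is an illustrative sketch under stated assumptions, not the exact routine used here: `mantel_test` is a hypothetical helper, the coordinates and life expectancy values are simulated, and plain Euclidean distance on the coordinates stands in for a proper geographic distance.

```r
# Mantel test: Pearson correlation between the lower triangles of two
# square distance matrices, with a one-sided permutation p-value.
mantel_test <- function(d1, d2, n_perm = 999) {
  stopifnot(is.matrix(d1), is.matrix(d2),
            nrow(d1) == ncol(d1), all(dim(d1) == dim(d2)))
  lower <- lower.tri(d1)
  observed <- cor(d1[lower], d2[lower])
  n <- nrow(d1)
  permuted <- replicate(n_perm, {
    idx <- sample(n)                     # relabel the samples
    cor(d1[lower], d2[idx, idx][lower])  # permute rows and columns together
  })
  p_value <- (sum(permuted >= observed) + 1) / (n_perm + 1)
  list(statistic = observed, p_value = p_value)
}

# Simulated data: 20 locations with a west-to-east gradient in the outcome.
set.seed(1)
coords <- cbind(lon = runif(20), lat = runif(20))
life_exp <- 70 + 10 * coords[, "lon"] + rnorm(20, sd = 0.5)

geo_dist <- as.matrix(dist(coords))      # closeness in space
le_dist <- as.matrix(dist(life_exp))     # closeness in life expectancy

result <- mantel_test(geo_dist, le_dist)
print(result)
```

Rows and columns are permuted together because the entries of a distance matrix are not independent of one another, so shuffling individual distances would not give a valid null distribution.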

Datasets

| Name | Information | Type | Data Source | Geographic Scale | Date |
| --- | --- | --- | --- | --- | --- |
| Real Property Taxes | Contains information on which streets belong to which block and in what neighbourhood, along with their longitude and latitude. Also has information on police district. | Dataset | Baltimore city website | Street level | 2016 |
| Real Property | Contains the City of Baltimore parcel boundaries, with ownership, address, valuation and other property information. It also contains street block definitions. | Dataset | Baltimore gisdata website | Street level | 2016 |
| Census Block | GIS shapefile which has information on census block designation for 2010 | Shapefile | Maryland Department of Planning | Block level | 2010 |
| Neighborhood | Polygon feature representing the boundaries of Baltimore City’s neighborhoods as of the year 2010 | Shapefile | Baltimore city website | Neighborhood level | 2010 |
| Census Demographics for 2010 to 2014 | Contains neighborhood-level demographics data | Dataset | Baltimore Neighborhood Indicators Alliance BNIA-JF | Neighborhood level | 2010-2014 |
| Children and Family Health & Well-Being | Has information on life expectancy for 2010 to 2014 | Dataset | Baltimore Neighborhood Indicators Alliance BNIA-JF | Neighborhood level | 2010-2014 |
| BNIA Vital Signs Codebook | Contains information on short variable names and their corresponding full names, along with their sources for each dataset | Dataset | Baltimore city website | Neighborhood level | 2016 |
| Housing and Community Development | Has information on the state of households in Baltimore city, viz. Number of Homes Sold, Percentage of Residential Properties that are Vacant and Abandoned, Percent Residential Properties that do Not Receive Mail, etc. | Dataset | Baltimore Neighborhood Indicators Alliance BNIA-JF | Neighborhood level | 2010-2014 |
devtools::session_info()
## Session info --------------------------------------------------------------
##  setting  value                       
##  version  R version 3.3.1 (2016-06-21)
##  system   x86_64, mingw32             
##  ui       RTerm                       
##  language (EN)                        
##  collate  English_United States.1252  
##  tz       America/New_York            
##  date     2016-09-26
## Packages ------------------------------------------------------------------
##  package       * version date       source        
##  animation     * 2.4     2015-08-16 CRAN (R 3.3.1)
##  assertthat      0.1     2013-12-06 CRAN (R 3.3.1)
##  broom         * 0.4.1   2016-06-24 CRAN (R 3.3.1)
##  colorspace      1.2-6   2015-03-11 CRAN (R 3.3.1)
##  DBI             0.4-1   2016-05-08 CRAN (R 3.3.1)
##  devtools      * 1.12.0  2016-06-24 CRAN (R 3.3.1)
##  digest          0.6.9   2016-01-08 CRAN (R 3.3.1)
##  downloader    * 0.4     2015-07-09 CRAN (R 3.3.1)
##  dplyr           0.5.0   2016-06-24 CRAN (R 3.3.1)
##  evaluate        0.9     2016-04-29 CRAN (R 3.3.1)
##  foreign         0.8-66  2015-08-19 CRAN (R 3.3.0)
##  formatR         1.4     2016-05-09 CRAN (R 3.3.1)
##  geosphere       1.5-5   2016-06-15 CRAN (R 3.3.1)
##  ggmap         * 2.6.1   2016-01-23 CRAN (R 3.3.1)
##  ggplot2       * 2.1.0   2016-03-01 CRAN (R 3.3.1)
##  gtable          0.2.0   2016-02-26 CRAN (R 3.3.1)
##  htmltools       0.3.5   2016-03-21 CRAN (R 3.3.1)
##  jpeg            0.1-8   2014-01-23 CRAN (R 3.3.0)
##  knitr           1.13    2016-05-09 CRAN (R 3.3.1)
##  lattice         0.20-33 2015-07-14 CRAN (R 3.3.1)
##  lubridate     * 1.5.6   2016-04-06 CRAN (R 3.3.1)
##  magrittr        1.5     2014-11-22 CRAN (R 3.3.1)
##  mapproj         1.2-4   2015-08-03 CRAN (R 3.3.1)
##  maps            3.1.0   2016-02-13 CRAN (R 3.3.1)
##  maptools      * 0.8-39  2016-01-30 CRAN (R 3.3.1)
##  memoise         1.0.0   2016-01-29 CRAN (R 3.3.1)
##  mnormt          1.5-4   2016-03-09 CRAN (R 3.3.0)
##  munsell         0.4.3   2016-02-13 CRAN (R 3.3.1)
##  nlme            3.1-128 2016-05-10 CRAN (R 3.3.1)
##  plyr            1.8.4   2016-06-08 CRAN (R 3.3.1)
##  png             0.1-7   2013-12-03 CRAN (R 3.3.0)
##  proto           0.3-10  2012-12-22 CRAN (R 3.3.0)
##  psych           1.6.6   2016-06-28 CRAN (R 3.3.1)
##  R6              2.1.2   2016-01-26 CRAN (R 3.3.1)
##  RColorBrewer  * 1.1-2   2014-12-07 CRAN (R 3.3.0)
##  Rcpp            0.12.5  2016-05-14 CRAN (R 3.3.1)
##  readr         * 0.2.2   2015-10-22 CRAN (R 3.3.1)
##  readxl        * 0.1.1   2016-03-28 CRAN (R 3.3.1)
##  reshape2        1.4.1   2014-12-06 CRAN (R 3.3.1)
##  RevoUtils       10.0.1  2016-08-24 local         
##  RevoUtilsMath * 8.0.3   2016-04-13 local         
##  rgdal         * 1.1-10  2016-05-12 CRAN (R 3.3.1)
##  rgeos         * 0.3-19  2016-04-04 CRAN (R 3.3.1)
##  RgoogleMaps     1.2.0.7 2015-01-21 CRAN (R 3.3.1)
##  rjson           0.2.15  2014-11-03 CRAN (R 3.3.0)
##  RJSONIO         1.3-0   2014-07-28 CRAN (R 3.3.0)
##  rmarkdown       0.9.6   2016-05-01 CRAN (R 3.3.1)
##  scales          0.4.0   2016-02-26 CRAN (R 3.3.1)
##  sp            * 1.2-3   2016-04-14 CRAN (R 3.3.1)
##  stringi         1.1.1   2016-05-27 CRAN (R 3.3.0)
##  stringr         1.0.0   2015-04-30 CRAN (R 3.3.1)
##  tibble          1.0     2016-03-23 CRAN (R 3.3.0)
##  tidyr           0.5.1   2016-06-14 CRAN (R 3.3.1)
##  withr           1.0.2   2016-06-20 CRAN (R 3.3.1)
##  yaml            2.1.13  2014-06-12 CRAN (R 3.3.1)

References

Dutilleul, Pierre, Jason Stockwell, Dominic Frigon, and Pierre Legendre. 2000. “The Mantel Test Versus Pearson’s Correlation Analysis: Assessment of the Differences for Biological and Environmental Studies.” Journal of Agricultural, Biological, and Environmental Statistics 5 (June). International Biometric Society: 131–50. http://www.jstor.org/stable/1400528.

Mantel, Nathan. 1966. “The Detection of Disease Clustering and a Generalized Regression Approach.” American Association for Cancer Research, September.

Moran, Patrick Alfred Pierce. 1950. “Notes on Continuous Stochastic Phenomena.” Biometrika 37 (June). Oxford University Press on behalf of Biometrika Trust: 17–23. http://www.jstor.org/stable/2332142.